Abstract: This paper investigates the linear precoder design
that maximizes the average mutual information of multiple-input
multiple-output channels with finite-alphabet inputs and
statistical channel state information known at the transmitter.
This linear precoder design is an important open problem and is
extremely difficult to solve: first, the average mutual
information lacks a closed-form expression and involves
complicated computations; second, the optimization problem over
the precoder is nonconcave. This study explores the solution to
this problem and provides the following contributions: 1) A
closed-form lower bound of the average mutual information is
derived. It achieves asymptotic optimality in the low and high
signal-to-noise ratio regions and, with a constant shift, offers
an accurate approximation to the average mutual information;
2) The optimal structure of the precoder is revealed, and a
unified two-step iterative algorithm is proposed to solve this
problem. Numerical examples show the convergence and the
efficacy of the proposed algorithm. Compared to its conventional
counterparts, the proposed linear precoding method provides a
significant performance gain.

I. Introduction 

The theoretical limit on the information rate that a
communication channel can support with arbitrarily low
probability of error is referred to as channel capacity. This
capacity is achievable with independent Gaussian distributed
inputs for parallel additive white Gaussian noise (AWGN)
channels and with correlated Gaussian inputs for multiple-input
multiple-output (MIMO) channels [1].

Even though Gaussian inputs are theoretically optimal, they
are rarely realized in practice. Instead, inputs are usually
taken from a finite-alphabet constellation set, such as phase
shift keying (PSK) or quadrature amplitude modulation (QAM),
which departs significantly from the Gaussian assumption.
Therefore, there can be a large performance gap between the
precoding schemes designed from the standpoint of
finite-alphabet inputs and those designed under Gaussian-input
assumptions. For example, in [2], the optimal power allocation
for parallel Gaussian channels with finite-alphabet inputs is
obtained; for the case of MIMO channels, the necessary condition
satisfied by the optimal precoder is given in [3]. Optimization
of the precoder using a gradient-descent method is introduced in
[4]; optimization utilizing the structure of the optimal
precoder is considered in [5]-[7].

The above-stated results and precoding algorithms hold
when the transmitter is able to accurately track the
instantaneous channel state information (CSI). For fast fading
channels, long-term channel statistics are more plausible to
exploit because they vary with the antenna parameters and the
surrounding environment and thus may change very slowly.

This study explores the linear precoder that maximizes the 
average mutual information with statistical CSI. It starts by 
decomposing the precoder, by the singular value decompo- 
sition (SVD), into three components: left singular vectors, 
diagonal power allocation matrix, and right singular vectors, 
and proves the left singular vectors equal the eigenvectors 
of the transmit correlation matrix. Due to the prohibitive 
complexity of evaluating average mutual information, a closed- 
form lower bound is derived. Interestingly, with a constant 
shift, the lower bound function offers a very good approxima- 
tion to the average mutual information, and using the bound for 
precoder design achieves the optimality asymptotically in the 
low and high signal-to-noise ratio (SNR) regions. Therefore, 
this paper proposes to use the bound as an alternative and 
develops an iterative algorithm, based on convex optimization 
and optimization on the Stiefel manifold, to obtain good 
solutions to power allocation matrix and right singular vectors. 

Notation: Boldface uppercase letters denote matrices, boldface
lowercase letters denote column vectors, and italics denote
scalars. The superscripts $(\cdot)^T$ and $(\cdot)^H$ stand for
transpose and Hermitian operations, respectively;
$[\mathbf{A}]_{i,j}$ denotes the $(i,j)$-th element of matrix
$\mathbf{A}$; $\mathrm{Tr}(\mathbf{A})$ denotes the trace
operation; $\mathbf{I}$ represents an identity matrix. $E$
denotes statistical expectation, and $\mathbb{C}$ denotes the
complex space. All logarithms are base 2.

II. System Model and Preliminaries 

Consider a MIMO system over frequency-flat fading with
$N_t$ transmit antennas and $N_r$ receive antennas. Let
$\mathbf{x} \in \mathbb{C}^{N_t}$ be a transmit signal vector
with zero mean and unit covariance; the receive signal
$\mathbf{y} \in \mathbb{C}^{N_r}$ is given by

$$\mathbf{y} = \mathbf{H}\mathbf{P}\mathbf{x} + \mathbf{n} \tag{1}$$

where $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ is a random
channel matrix whose $(i,j)$-th entry is the complex propagation
coefficient between the $j$-th transmit antenna and the $i$-th
receive antenna; $\mathbf{P} \in \mathbb{C}^{N_t \times N_t}$
is a linear precoding matrix; $\mathbf{n} \in \mathbb{C}^{N_r}$
is an independent and identically distributed (i.i.d.)
zero-mean circularly-symmetric Gaussian noise with covariance
$\sigma^2 \mathbf{I}$.

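For doubly correlated MIMO channels, the channel matrix
$\mathbf{H}$ can be modeled as

$$\mathbf{H} = \boldsymbol{\Psi}_r^{1/2} \mathbf{H}_w \boldsymbol{\Psi}_t^{1/2} \tag{2}$$

where $\mathbf{H}_w \in \mathbb{C}^{N_r \times N_t}$ is a
complex matrix with i.i.d. zero-mean and unit variance Gaussian
entries; $\boldsymbol{\Psi}_t \in \mathbb{C}^{N_t \times N_t} > 0$
and $\boldsymbol{\Psi}_r \in \mathbb{C}^{N_r \times N_r} > 0$,
respectively, are the transmit and receive correlation matrices
known by the transmitter.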
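
As an illustration of model (2), the following minimal sketch draws one channel realization. It assumes NumPy; the function names `psd_sqrt` and `correlated_channel`, the seed, and the example correlation values are hypothetical choices made for this example and are not taken from the paper.

```python
import numpy as np

def psd_sqrt(a):
    """Hermitian square root of a positive semidefinite matrix."""
    w, u = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)            # guard against tiny negative round-off
    return (u * np.sqrt(w)) @ u.conj().T

def correlated_channel(psi_r, psi_t, rng):
    """One realization of H = psi_r^{1/2} H_w psi_t^{1/2}, as in (2)."""
    n_r, n_t = psi_r.shape[0], psi_t.shape[0]
    # H_w: i.i.d. zero-mean, unit-variance complex Gaussian entries
    h_w = (rng.standard_normal((n_r, n_t))
           + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    return psd_sqrt(psi_r) @ h_w @ psd_sqrt(psi_t)

rng = np.random.default_rng(0)                 # illustrative seed
psi_t = np.array([[1.0, 0.8], [0.8, 1.0]])     # example transmit correlation
psi_r = np.array([[1.0, 0.5], [0.5, 1.0]])     # example receive correlation
H = correlated_channel(psi_r, psi_t, rng)
```

Only the two correlation matrices are needed at the transmitter side; the realization `H` itself is what the statistical-CSI setting assumes is unknown there.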

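With the product of $\mathbf{H}$ and $\mathbf{P}$ known at the
receiver and the input signal drawn from the $M$-ary
equiprobable finite-alphabet constellation, the average mutual
information between $\mathbf{x}$ and $\mathbf{y}$,
$\bar{\mathcal{I}}(\mathbf{x};\mathbf{y})$, is given by [3]:

$$\bar{\mathcal{I}}(\mathbf{x};\mathbf{y}) = E_{\mathbf{H}_w}\, \mathcal{I}(\mathbf{x};\mathbf{y}) \tag{3}$$

in which $\mathcal{I}(\mathbf{x};\mathbf{y})$ is the
instantaneous mutual information:

$$\mathcal{I}(\mathbf{x};\mathbf{y}) = N_t \log M - \frac{1}{M^{N_t}} \sum_{m=1}^{M^{N_t}} E_{\mathbf{n}} \log \sum_{k=1}^{M^{N_t}} \exp(-d_{m,k}),$$

where $d_{m,k} = (\|\mathbf{H}\mathbf{P}\mathbf{e}_{mk} + \mathbf{n}\|^2 - \|\mathbf{n}\|^2)/\sigma^2$,
$\|\cdot\|$ denotes the Euclidean norm of a vector, and
$\mathbf{e}_{mk} = \mathbf{x}_m - \mathbf{x}_k$. Both
$\mathbf{x}_m$ and $\mathbf{x}_k$ contain $N_t$ symbols, taken
from the signal constellation.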
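
The instantaneous mutual information above has no closed form, but for a fixed channel it can be estimated by averaging over noise samples. The sketch below is one possible Monte Carlo estimator, assuming NumPy; the function names, the sample count `n_noise`, and the seed are illustrative assumptions rather than the authors' simulation setup.

```python
import itertools
import numpy as np

def constellation_vectors(symbols, n_t):
    """All M**n_t transmit vectors, one per row."""
    return np.array(list(itertools.product(symbols, repeat=n_t)))

def instantaneous_mi(h, p, symbols, sigma2, n_noise=2000, seed=0):
    """Monte Carlo estimate (in bits) of I(x;y) for fixed H and P."""
    rng = np.random.default_rng(seed)
    n_r, n_t = h.shape
    x = constellation_vectors(symbols, n_t)            # (K, n_t)
    hp = h @ p
    # Row [m, k] holds (H P e_mk)^T with e_mk = x_m - x_k
    diff = (x[:, None, :] - x[None, :, :]) @ hp.T      # (K, K, n_r)
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal((n_noise, n_r))
                                   + 1j * rng.standard_normal((n_noise, n_r)))
    # d[s, m, k] = (||H P e_mk + n_s||^2 - ||n_s||^2) / sigma2
    shifted = diff[None] + noise[:, None, None, :]
    d = (np.sum(np.abs(shifted) ** 2, axis=-1)
         - np.sum(np.abs(noise) ** 2, axis=-1)[:, None, None]) / sigma2
    # Numerically stable log2( sum_k exp(-d) )
    a = -d
    a_max = a.max(axis=-1, keepdims=True)
    lse = (a_max[..., 0] + np.log(np.exp(a - a_max).sum(axis=-1))) / np.log(2)
    # Mean over noise samples and over m gives (1/K) sum_m E_n log2 sum_k exp(-d)
    return n_t * np.log2(len(symbols)) - lse.mean()
```

Averaging this estimate over many draws of the channel would give an estimate of the average mutual information in (3); the cost grows with $M^{2N_t}$, which is the computational burden the closed-form bound is meant to avoid.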

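Considering the unitary invariance of the Euclidean norm, the
following relationship can be identified:

$$\mathcal{I}(\mathbf{x};\mathbf{y}) = \mathcal{I}(\mathbf{x};\mathbf{U}\mathbf{y}) \quad \text{and} \quad \mathcal{I}(\mathbf{x};\mathbf{y}) \neq \mathcal{I}(\mathbf{U}\mathbf{x};\mathbf{y}) \tag{4}$$

which implies that a linear precoder, even a unitary one, may
change the average mutual information of MIMO systems.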

The objective of this work is to develop an efficient algorithm
to find a linear precoding solution that maximizes the average
mutual information in (3). The optimization is carried out over
all possible $N_t \times N_t$ complex precoding matrices with a
transmit power constraint:

$$\begin{aligned} &\text{maximize} && \bar{\mathcal{I}}(\mathbf{x};\mathbf{y}) \\ &\text{subject to} && \mathrm{Tr}(\mathbf{P}\mathbf{P}^H) \le N_t. \end{aligned} \tag{5}$$

Since $\bar{\mathcal{I}}(\mathbf{x};\mathbf{y})$ is an
increasing function of SNR (i.e., $1/\sigma^2$), the optimal
precoder should use the maximum available power; that is, the
inequality constraint can be replaced by the equality
constraint $\mathrm{Tr}(\mathbf{P}\mathbf{P}^H) = N_t$.

The obstacles in the way to solve problem (5) are twofold. 
First, the closed-form objective function is lacking (see [4] 
for the case of instantaneous CSI); second, the optimization 
problem is nonconcave and extremely difficult even for some 
specific cases (see [5] for the case of instantaneous CSI 
with real-valued channels). The next section will explore the 
structure of the optimal precoding matrix and will provide a 
closed-form lower bound to the objective function. 

III. Optimal Precoding Structure and Average 
Mutual Information Bound 

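From eigenvalue decomposition, the correlation matrices
$\boldsymbol{\Psi}_t$ and $\boldsymbol{\Psi}_r$ can be
expressed as

$$\boldsymbol{\Psi}_t = \mathbf{U}_t \boldsymbol{\Sigma}_t \mathbf{U}_t^H \quad \text{and} \quad \boldsymbol{\Psi}_r = \mathbf{U}_r \boldsymbol{\Sigma}_r \mathbf{U}_r^H \tag{6}$$

where $\mathbf{U}_t$ and $\mathbf{U}_r$ are unitary matrices
whose columns are eigenvectors of $\boldsymbol{\Psi}_t$ and
$\boldsymbol{\Psi}_r$, and $\boldsymbol{\Sigma}_t$ and
$\boldsymbol{\Sigma}_r$ are diagonal matrices whose diagonal
entries are the eigenvalues of $\boldsymbol{\Psi}_t$ and
$\boldsymbol{\Psi}_r$, respectively. Applying the property in
(4) and the fact that the random matrices $\mathbf{H}_w$ and
$\tilde{\mathbf{H}}_w = \mathbf{U}_r^H \mathbf{H}_w \mathbf{U}_t$
have the same statistics, the channel model (1) can be reduced
to

$$\tilde{\mathbf{y}} = \tilde{\mathbf{H}}\mathbf{P}\mathbf{x} + \tilde{\mathbf{n}} \tag{7}$$

where $\tilde{\mathbf{H}} = \boldsymbol{\Sigma}_r^{1/2} \tilde{\mathbf{H}}_w \boldsymbol{\Sigma}_t^{1/2} \mathbf{U}_t^H$,
and $\tilde{\mathbf{y}}$ and $\tilde{\mathbf{n}}$ are the
results when the unitary transform $\mathbf{U}_r^H$ is applied
to $\mathbf{y}$ and $\mathbf{n}$, respectively.
Because maximizing $\bar{\mathcal{I}}(\mathbf{x};\tilde{\mathbf{y}})$
based on the model (7) is equivalent to maximizing
$\bar{\mathcal{I}}(\mathbf{x};\mathbf{y})$ from the model (1),
the subsequent discussion is based on this simplified model.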

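The instantaneous mutual information
$\mathcal{I}(\mathbf{x};\tilde{\mathbf{y}})$ depends on
$\mathbf{P}$ through
$\mathbf{M} = \mathbf{P}^H \tilde{\mathbf{H}}^H \tilde{\mathbf{H}} \mathbf{P}$
(see [5], [6] for the case of real-valued channels and [7] for
complex-valued channels), which is a function of the random
matrix $\tilde{\mathbf{H}}_w$. The expectation taken over
$\tilde{\mathbf{H}}_w$ thus equals the expectation over
$\mathbf{M}$:

$$\bar{\mathcal{I}}(\mathbf{x};\tilde{\mathbf{y}}) = E_{\tilde{\mathbf{H}}_w}\, \mathcal{I}(\mathbf{M}) = E_{\mathbf{M}}\, \mathcal{I}(\mathbf{M}) \tag{8}$$

where $\mathcal{I}(\mathbf{M})$ emphasizes the dependence of
the instantaneous mutual information on $\mathbf{M}$. Since
$\mathcal{I}(\mathbf{x};\tilde{\mathbf{y}})$ is a function of
the random matrix $\mathbf{M}$, its average is determined by
the probability density function (PDF) of $\mathbf{M}$ [8]: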

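$$p(\mathbf{M}) = \frac{\det(\mathbf{W})^{-N_r} \det(\boldsymbol{\Sigma}_r)^{-N_t} \det(\mathbf{M})^{N_r - N_t}}{\Gamma_{N_r}(N_t)} \times {}_0F_0(\,\cdots) \tag{9}$$

where $\Gamma_{N_r}(N_t)$ is a constant related to $N_r$ and
$N_t$; $\mathbf{W}$ is
$\mathbf{P}^H \mathbf{U}_t \boldsymbol{\Sigma}_t \mathbf{U}_t^H \mathbf{P}$;
${}_0F_0(\cdot)$ is the hypergeometric function of two
Hermitian matrices.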

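From (9), the distribution of $\mathbf{M}$ is determined by the
constant parameters $\boldsymbol{\Sigma}_r$ and $\mathbf{W}$.
Consider the SVD of the precoding matrix
$\mathbf{P} = \mathbf{U}_P \boldsymbol{\Sigma}_P \mathbf{V}_P^H$,
where $\mathbf{U}_P$ and $\mathbf{V}_P$ are unitary matrices,
and $\boldsymbol{\Sigma}_P$ contains nonnegative diagonal
entries.

Proposition 1: Given the parameter
$\mathbf{W} = \mathbf{P}^H \mathbf{U}_t \boldsymbol{\Sigma}_t \mathbf{U}_t^H \mathbf{P}$
of the distribution of the random matrix $\mathbf{M}$, the
precoder in the form
$\mathbf{P} = \mathbf{U}_t \boldsymbol{\Sigma}_P \mathbf{V}_P^H$
minimizes the transmit power $\mathrm{Tr}(\mathbf{P}\mathbf{P}^H)$.
Proof: See Appendix 3.B in [9]. □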

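This result provides the design for the left singular vectors,
which equal the eigenvectors of the transmit correlation matrix
$\boldsymbol{\Psi}_t$; it simplifies the channel model (7) to

$$\tilde{\mathbf{y}} = \boldsymbol{\Sigma}_r^{1/2} \tilde{\mathbf{H}}_w \boldsymbol{\Sigma}_t^{1/2} \tilde{\mathbf{P}} \mathbf{x} + \tilde{\mathbf{n}} \tag{10}$$

and reduces the average mutual information function to

$$\bar{\mathcal{I}}(\mathbf{x};\tilde{\mathbf{y}}) = N_t \log M - \frac{N_r}{\ln 2} - \frac{1}{M^{N_t}} \sum_{m=1}^{M^{N_t}} E_{\tilde{\mathbf{H}}_w, \tilde{\mathbf{n}}} \log \sum_{k=1}^{M^{N_t}} \exp\!\left( -\frac{\|\boldsymbol{\Sigma}_r^{1/2} \tilde{\mathbf{H}}_w \boldsymbol{\Sigma}_t^{1/2} \tilde{\mathbf{P}} \mathbf{e}_{mk} + \tilde{\mathbf{n}}\|^2}{\sigma^2} \right)$$

where $\tilde{\mathbf{P}} = \boldsymbol{\Sigma}_P \mathbf{V}_P^H$
is the remaining part of $\mathbf{P}$. However, it is still
difficult, if not impossible, to evaluate the multiple integral
numerically (for $N_r \times N_t$ MIMO channels,
$2(N_r N_t + N_r)$ integrals from $-\infty$ to $\infty$ need to
be considered).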

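Proposition 2: The average mutual information of doubly
correlated MIMO channels with finite-alphabet inputs can be
lower bounded by

$$\mathcal{I}_L = N_t \log M - N_r \left( \frac{1}{\ln 2} - 1 \right) - \frac{1}{M^{N_t}} \sum_{m=1}^{M^{N_t}} \log \sum_{k=1}^{M^{N_t}} \prod_{q} \left( 1 + \frac{r_q}{2\sigma^2} \mathbf{e}_{mk}^H \tilde{\mathbf{P}}^H \boldsymbol{\Sigma}_t \tilde{\mathbf{P}} \mathbf{e}_{mk} \right)^{-1}$$

where $r_q$ denotes the $q$-th diagonal element of
$\boldsymbol{\Sigma}_r$.

Proof: This bound can be proved by using Jensen's
inequality directly. Details are omitted here for brevity. □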
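
Since the proof is omitted, the following is a sketch of one way the Jensen step could go; it is an illustrative derivation consistent with the bound above, not necessarily the authors' own argument. Write $\mathbf{a}_{mk} = \boldsymbol{\Sigma}_r^{1/2} \tilde{\mathbf{H}}_w \boldsymbol{\Sigma}_t^{1/2} \tilde{\mathbf{P}} \mathbf{e}_{mk}$ and $c_{mk} = \mathbf{e}_{mk}^H \tilde{\mathbf{P}}^H \boldsymbol{\Sigma}_t \tilde{\mathbf{P}} \mathbf{e}_{mk}$.

```latex
% Step 1: concavity of log (Jensen) moves the expectation inside.
E_{\tilde{\mathbf{H}}_w,\tilde{\mathbf{n}}} \log \sum_{k}
  \exp\!\left(-\tfrac{\|\mathbf{a}_{mk}+\tilde{\mathbf{n}}\|^2}{\sigma^2}\right)
\le \log \sum_{k} E_{\tilde{\mathbf{H}}_w} E_{\tilde{\mathbf{n}}}
  \exp\!\left(-\tfrac{\|\mathbf{a}_{mk}+\tilde{\mathbf{n}}\|^2}{\sigma^2}\right)

% Step 2: Gaussian integral over the noise, \tilde{n} ~ CN(0, \sigma^2 I).
E_{\tilde{\mathbf{n}}}
  \exp\!\left(-\tfrac{\|\mathbf{a}_{mk}+\tilde{\mathbf{n}}\|^2}{\sigma^2}\right)
= 2^{-N_r} \exp\!\left(-\tfrac{\|\mathbf{a}_{mk}\|^2}{2\sigma^2}\right)

% Step 3: the q-th entry of a_{mk} is CN(0, r_q c_{mk}), entries independent.
E_{\tilde{\mathbf{H}}_w}
  \exp\!\left(-\tfrac{\|\mathbf{a}_{mk}\|^2}{2\sigma^2}\right)
= \prod_{q} \left(1 + \tfrac{r_q}{2\sigma^2}\, c_{mk}\right)^{-1}
```

Substituting these into the expression for the average mutual information, the factor $2^{-N_r}$ contributes $+N_r$ after the base-2 logarithm, which turns $-N_r/\ln 2$ into $-N_r(1/\ln 2 - 1)$ and gives the stated form of $\mathcal{I}_L$.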



IV. Precoder Design for Maximizing the Average 
Mutual Information 

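This section starts by proving the asymptotic optimality
of maximizing the lower bound and then develops a unified
algorithm to do that. Based on Proposition 2, the problem of
maximizing $\mathcal{I}_L$ is equivalent to the following
problem:

$$\begin{aligned} &\text{minimize} && \sum_{m=1}^{M^{N_t}} \log \sum_{k=1}^{M^{N_t}} \prod_{q} \left( 1 + \frac{r_q}{2\sigma^2} \mathbf{e}_{mk}^H \tilde{\mathbf{P}}^H \boldsymbol{\Sigma}_t \tilde{\mathbf{P}} \mathbf{e}_{mk} \right)^{-1} \\ &\text{subject to} && \mathrm{Tr}(\tilde{\mathbf{P}}\tilde{\mathbf{P}}^H) = N_t. \end{aligned} \tag{11}$$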
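
Because the bound is in closed form, it can be evaluated directly without any Monte Carlo averaging. The sketch below computes $\mathcal{I}_L$ from Proposition 2, assuming NumPy; the function name `lower_bound` and its argument layout (eigenvalue vectors `sigma_t` and `r` rather than full matrices) are choices made for this example.

```python
import itertools
import numpy as np

def lower_bound(p_tilde, sigma_t, r, symbols, sigma2):
    """Lower bound I_L of Proposition 2, in bits.

    p_tilde : (n_t, n_t) matrix Sigma_P V_P^H
    sigma_t : eigenvalues of the transmit correlation matrix
    r       : eigenvalues of the receive correlation matrix
    """
    n_t = p_tilde.shape[1]
    n_r = len(r)
    x = np.array(list(itertools.product(symbols, repeat=n_t)))
    e = x[:, None, :] - x[None, :, :]                  # e_mk, shape (K, K, n_t)
    v = e @ p_tilde.T                                  # rows are (P~ e_mk)^T
    # Quadratic form e^H P~^H Sigma_t P~ e for every pair (m, k)
    quad = np.sum(np.asarray(sigma_t) * np.abs(v) ** 2, axis=-1)
    # log of prod_q (1 + r_q * quad / (2 sigma2))^{-1}
    scaled = np.asarray(r)[:, None, None] * quad / (2 * sigma2)
    log_prod = -np.sum(np.log1p(scaled), axis=0)
    inner = np.log2(np.exp(log_prod).sum(axis=1))      # one value per m
    return (n_t * np.log2(len(symbols))
            - n_r * (1 / np.log(2) - 1)
            - inner.mean())
```

Minimizing the objective of (11) is the same as maximizing the value returned here, since the first two terms do not depend on the precoder.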

A. Asymptotic Optimality and Concavity of Lower Bound 

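1) Asymptotic Optimality at Low SNR Region: When
$\sigma^2 \to +\infty$, the objective function in (11) is
expressed, based on Taylor expansion, as

$$\sum_{m=1}^{M^{N_t}} \log \sum_{k=1}^{M^{N_t}} \prod_{q} \left( 1 + \frac{r_q}{2\sigma^2} \mathbf{e}_{mk}^H \tilde{\mathbf{P}}^H \boldsymbol{\Sigma}_t \tilde{\mathbf{P}} \mathbf{e}_{mk} \right)^{-1} = M^{N_t} \log M^{N_t} - \frac{\mathrm{Tr}(\boldsymbol{\Sigma}_r)}{2\sigma^2 M^{N_t} \ln 2} \sum_{m=1}^{M^{N_t}} \sum_{k=1}^{M^{N_t}} \mathbf{e}_{mk}^H \tilde{\mathbf{P}}^H \boldsymbol{\Sigma}_t \tilde{\mathbf{P}} \mathbf{e}_{mk} + \mathcal{O}(1/\sigma^4)$$

where $\mathcal{O}(1/\sigma^4)$ denotes the least-significant
terms on the order of $1/\sigma^4$. Since
$\mathbf{e}_{mk}^H \tilde{\mathbf{P}}^H \boldsymbol{\Sigma}_t \tilde{\mathbf{P}} \mathbf{e}_{mk}$
is a scalar, it yields

$$\sum_{m=1}^{M^{N_t}} \sum_{k=1}^{M^{N_t}} \mathbf{e}_{mk}^H \tilde{\mathbf{P}}^H \boldsymbol{\Sigma}_t \tilde{\mathbf{P}} \mathbf{e}_{mk} = \bar{e}\, \mathrm{Tr}(\boldsymbol{\Sigma}_t \boldsymbol{\Sigma}_P^2) \tag{12}$$

where $\bar{e}$ is a constant with
$\bar{e}\,\mathbf{I} = \sum_{m,k} \mathbf{e}_{mk} \mathbf{e}_{mk}^H$.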

Combining the optimal precoding structure in Proposition 1
and the diagonal matrix $\boldsymbol{\Sigma}_P$ that maximizes
(12) under the power constraint, the solution of problem (11)
in the low SNR region is given by the following proposition:

Proposition 3: The optimal precoder to maximize the lower
bound in the low SNR region equals $\mathbf{U}_t$ times a
diagonal power allocation matrix with all power allocated to
the maximum singular value of $\boldsymbol{\Psi}_t$ (i.e., the
beamforming strategy).

The result based on maximizing the lower bound presented 
here is the same as the result of maximizing average mutual 
information directly at the low SNR region (the latter can be 
derived by extending the analysis for the case of instantaneous 
CSI in [4]); that is, it is asymptotically optimal to maximize 
the lower bound at the low SNR region. 
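
The beamforming solution of Proposition 3 can be written out in a few lines. This is a minimal sketch assuming NumPy; the function name is hypothetical, and the right singular vectors are set to the identity as an assumption, which is harmless here because the low-SNR objective (12) does not depend on them.

```python
import numpy as np

def beamforming_precoder(psi_t):
    """Low-SNR precoder: all power on the strongest eigenmode of psi_t."""
    w, u = np.linalg.eigh(psi_t)           # eigenvalues in ascending order
    n_t = psi_t.shape[0]
    power = np.zeros(n_t)
    power[np.argmax(w)] = n_t              # total power Tr(P P^H) = n_t
    # Left singular vectors U_t, right singular vectors assumed to be I
    return u @ np.diag(np.sqrt(power))

psi_t = np.array([[1.0, 0.8], [0.8, 1.0]])  # example correlation matrix
P = beamforming_precoder(psi_t)
```

For this example the eigenvalues of `psi_t` are 0.2 and 1.8, so the quantity $\mathrm{Tr}(\boldsymbol{\Sigma}_t \boldsymbol{\Sigma}_P^2)$ in (12) reaches its largest feasible value, $2 \times 1.8 = 3.6$.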

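2) Asymptotic Optimality at High SNR Region: The proof
of asymptotic optimality in the high SNR region starts by
rewriting the objective function of (11) as

$$\sum_{m=1}^{M^{N_t}} \log \sum_{k=1}^{M^{N_t}} \exp\!\left[ -\sum_{q} \ln\!\left( 1 + \frac{r_q}{2\sigma^2} \mathbf{e}_{mk}^H \tilde{\mathbf{P}}^H \boldsymbol{\Sigma}_t \tilde{\mathbf{P}} \mathbf{e}_{mk} \right) \right] \tag{13}$$

Note that $\log \sum_{k} \exp(\cdot)$ is a soft version of
maximization [11]. The idea here is to replace the soft
maximization by its hard version and approximate it for the
high SNR region.

Thus, the problem in (11), at high SNR, is equivalent to

$$\begin{aligned} &\text{maximize} && \min_{m,k,\; m \neq k} \mathbf{e}_{mk}^H \tilde{\mathbf{P}}^H \boldsymbol{\Sigma}_t \tilde{\mathbf{P}} \mathbf{e}_{mk} \\ &\text{subject to} && \mathrm{Tr}(\tilde{\mathbf{P}}\tilde{\mathbf{P}}^H) = N_t. \end{aligned} \tag{14}$$

The minimization of the quadratic form in (14) can be
identified as the minimum distance among all possible
realizations of the input vector

$$d_{\min} = \min_{m,k,\; m \neq k} \left\| \boldsymbol{\Sigma}_t^{1/2} \tilde{\mathbf{P}} (\mathbf{x}_m - \mathbf{x}_k) \right\| \tag{15}$$

which leads to the following result: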

Proposition 4: Maximizing the lower bound in the high SNR
region is equivalent to maximizing the minimum distance among
all the constellation vectors.

This result, obtained by maximizing the lower bound at high
SNR, is the same as that of maximizing the average mutual
information by extending [4]; that is, it is also
asymptotically optimal to maximize the lower bound in the high
SNR region.
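
The minimum distance in (15) is straightforward to evaluate by enumerating pairs of constellation vectors. The sketch below assumes NumPy; the function name and the brute-force enumeration are illustrative choices, practical only for small $M^{N_t}$.

```python
import itertools
import numpy as np

def min_distance(p_tilde, sigma_t, symbols):
    """Minimum of ||Sigma_t^{1/2} P~ (x_m - x_k)|| over all pairs m != k."""
    n_t = p_tilde.shape[1]
    x = np.array(list(itertools.product(symbols, repeat=n_t)))
    root = np.sqrt(np.asarray(sigma_t, dtype=float))   # diagonal of Sigma_t^{1/2}
    best = np.inf
    for a, b in itertools.combinations(x, 2):
        best = min(best, np.linalg.norm(root * (p_tilde @ (a - b))))
    return best
```

With QPSK, an identity precoder and transmit eigenvalues (1.8, 0.2), the weaker eigenmode dominates and the distance drops from $\sqrt{2}$ (uncorrelated case) to $\sqrt{0.4}$, which illustrates why the high-SNR design differs from beamforming.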

3) Concavity Results: Considering the low computational
complexity and the asymptotic optimality, it is reasonable to
apply the criterion of maximizing the lower bound. In order to
develop an efficient algorithm, concavity, which guarantees
global optimality, needs to be verified.

The first candidate is concavity over $\tilde{\mathbf{P}}$.
Unfortunately, it does not hold, as can be verified by a
counterexample (e.g., $\mathbf{H} \in \mathbb{C}^{1 \times 1}$).
The next candidate is concavity over $\boldsymbol{\Sigma}_P$.

Proposition 5: The lower bound of the average mutual
information is a concave function of $\boldsymbol{\lambda}$,
$\boldsymbol{\lambda} = \mathrm{Diag}(\boldsymbol{\Sigma}_P^2)$.

Proof: This result can be proved by identifying the
concavity over the diagonal elements of $\boldsymbol{\Sigma}_P^2$
in (13). □



B. Precoder Design 

The solution of P is separated into two parts: optimization 
of power allocation matrix and right singular vectors. 

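Optimal Power Allocation: Given right singular vectors
$\mathbf{V}_P$, the first step is to optimize
$\boldsymbol{\lambda}$:

$$\begin{aligned} &\text{maximize} && \mathcal{I}_L(\boldsymbol{\lambda}) \\ &\text{subject to} && \mathbf{1}^T \boldsymbol{\lambda} = N_t, \quad \boldsymbol{\lambda} \succeq \mathbf{0} \end{aligned} \tag{16}$$

where $\mathbf{1}$ and $\mathbf{0}$ denote the column vectors
with all entries one and zero, respectively. The concavity
result in Proposition 5 ensures that a globally optimal
solution can be found by either a gradient-based method or a
Newton-type method [11].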
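
One simple gradient-based scheme for a problem of the form (16) is projected gradient ascent, where each step is followed by a Euclidean projection back onto the feasible set. The sketch below assumes NumPy and uses the standard sort-based simplex projection; it is a swapped-in generic technique with hypothetical names, a fixed step size and a caller-supplied gradient, not the specific solver used in the paper.

```python
import numpy as np

def project_simplex(v, total):
    """Euclidean projection of v onto {lam >= 0, sum(lam) = total}."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                      # sorted in descending order
    css = np.cumsum(u) - total
    idx = np.arange(1, len(v) + 1)
    rho = idx[u - css / idx > 0][-1]          # number of active entries
    theta = css[rho - 1] / rho                # common shift
    return np.maximum(v - theta, 0.0)

def maximize_on_simplex(grad, lam0, total, step=0.05, iters=500):
    """Projected gradient ascent; grad(lam) returns the objective gradient."""
    lam = project_simplex(lam0, total)
    for _ in range(iters):
        lam = project_simplex(lam + step * grad(lam), total)
    return lam
```

For a concave objective such as the lower bound over $\boldsymbol{\lambda}$, a sufficiently small step size lets this iteration approach the global maximizer; the gradient could be supplied analytically or by finite differences.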

Optimization Over Right Singular Vectors: Given a power
allocation vector $\boldsymbol{\lambda}$, the maximization of
the lower bound boils down to the maximization over the right
singular vectors:

$$\begin{aligned} &\text{maximize} && \mathcal{I}_L(\mathbf{V}_P) \\ &\text{subject to} && \mathbf{V}_P^H \mathbf{V}_P = \mathbf{I}. \end{aligned} \tag{17}$$

Fig. 1. Average mutual information with QPSK inputs and
exponentially correlated ($\rho_t = 0.8$, $\rho_r = 0.5$) MIMO
channels ($N_t = N_r = 2$) for the case without precoding.

Fig. 2. Probability density and cumulative distribution of the
optimized average mutual information from different
initialization points. The input signal is drawn from QPSK;
MIMO channels ($2 \times 2$) are correlated with
$\rho_t = 0.8$ and $\rho_r = 0.5$; SNR is -5 dB.



To solve the above unitary-matrix constrained problem,
both gradient descent with projection and moving along
geodesics on Riemannian manifolds can be used [12].
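
For the projection variant, a common choice is to map the updated matrix to the nearest unitary matrix through its SVD (the unitary polar factor). The sketch below assumes NumPy; the function names and the plain Euclidean-gradient step are illustrative assumptions and do not reproduce the specific algorithms of [12].

```python
import numpy as np

def project_unitary(a):
    """Nearest unitary matrix to a in Frobenius norm (polar factor via SVD)."""
    u, _, vh = np.linalg.svd(a)
    return u @ vh

def projected_ascent_step(v, euclid_grad, step):
    """One ascent step on V_P followed by projection back to V^H V = I."""
    return project_unitary(v + step * euclid_grad)
```

Each iterate produced this way satisfies the constraint of (17) up to round-off, so the lower bound can be re-evaluated after every step to monitor progress.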

Two-Step Approach to Optimize Precoder. A two-step ap- 
proach can now be developed by combining the design for left 
singular vectors in Proposition 1, power allocation vector in 
problem (16), and right singular vectors in problem (17). 

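Algorithm: Two-Step Algorithm to Maximize the Lower
Bound of Average Mutual Information Over Linear Precoder

1) Initialization. Choose feasible initial points
$\boldsymbol{\lambda}^{(0)}$ and $\mathbf{V}_P^{(0)}$, and set
$n = 1$.

2) Design left singular vectors. Let $\mathbf{U}_P = \mathbf{U}_t$.

3) Update power allocation matrix. Solve problem (16):

$$\boldsymbol{\lambda}^{(n)} = \arg\max_{\mathbf{1}^T \boldsymbol{\lambda} = N_t,\; \boldsymbol{\lambda} \succeq \mathbf{0}} \mathcal{I}_L(\boldsymbol{\lambda}, \mathbf{V}_P^{(n-1)}).$$

4) Update right singular vectors. Solve problem (17):

$$\mathbf{V}_P^{(n)} = \arg\max_{\mathbf{V}_P^H \mathbf{V}_P = \mathbf{I}} \mathcal{I}_L(\boldsymbol{\lambda}^{(n)}, \mathbf{V}_P).$$

5) Iteration. Set $n = n + 1$, and go to Step 3 until
convergence.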
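
The control flow of Steps 3 to 5 can be sketched as a generic alternating-maximization loop. Everything here is a hypothetical skeleton: the callables `update_lam`, `update_v` and `objective` stand in for solvers of (16) and (17) and for the lower bound, and the stopping rule on the change in objective value is an assumption, since the paper does not state its convergence test.

```python
def two_step_maximize(update_lam, update_v, objective, lam0, v0,
                      tol=1e-9, max_iter=200):
    """Alternate between the two block updates until the objective stalls."""
    lam, v = lam0, v0
    value = objective(lam, v)
    for _ in range(max_iter):
        lam = update_lam(v)                  # Step 3: power allocation given V_P
        v = update_v(lam)                    # Step 4: right singular vectors given lam
        new_value = objective(lam, v)
        converged = abs(new_value - value) < tol
        value = new_value
        if converged:                        # Step 5: stop when progress is negligible
            break
    return lam, v, value
```

If each block update does not decrease the objective, the sequence of objective values is nondecreasing, which is why monitoring its change is a natural stopping rule for this kind of scheme.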

The two-step algorithm, optimizing the variables alternately,
converges to the globally optimal solution when the optimal
right singular vectors are unique and the bound is concave
over $\mathbf{V}_P$. When this condition fails, the iterative
algorithm converges to a local maximum, which may be affected
by the initialization of the algorithm. However, we show, by
numerical examples in the next section, that different
initializations have limited effect on the solution; that is,
the two-step algorithm achieves near-optimal performance.

V. Simulation Results 

Examples are provided to illustrate the relationship between 
average mutual information and the derived bound and to show 
the convergence and the efficacy of the proposed algorithm. 
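To exemplify our results, the exponential correlation model

$$[\boldsymbol{\Psi}(\rho)]_{i,j} = \rho^{|i-j|}, \quad \rho \in [0, 1)$$

with $\boldsymbol{\Psi}_t = \boldsymbol{\Psi}(\rho_t)$ and
$\boldsymbol{\Psi}_r = \boldsymbol{\Psi}(\rho_r)$ is considered.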
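
The exponential correlation model can be generated in one line. This is a minimal sketch assuming NumPy, with a hypothetical function name.

```python
import numpy as np

def exp_correlation(n, rho):
    """n x n matrix with entries rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

psi_t = exp_correlation(2, 0.8)   # transmit correlation used in the figures
psi_r = exp_correlation(2, 0.5)   # receive correlation used in the figures
```

For the $2 \times 2$ case the eigenvalues are $1 \pm \rho$, so $\rho_t = 0.8$ gives transmit eigenvalues 1.8 and 0.2.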

1) Relationship between Average Mutual Information and
Lower Bound: When the SNR approaches $0$ and $+\infty$, the
limits of the average mutual information in (3) are given by
$0$ and $N_t \log M$. At the same time, the limits of the lower
bound in Proposition 2 are, respectively,
$-N_r(1/\ln 2 - 1)$ and $N_t \log M - N_r(1/\ln 2 - 1)$, which
imply a constant gap in the low and high SNR regions between
the average mutual information and the lower bound. Since
adding a constant value to the lower bound function leaves the
solution to the optimization problem (11) unchanged, the lower
bound with a constant shift actually serves as a very good
approximation.
Figure 1 illustrates the derived lower bound, the lower bound
with a constant shift, $N_r(1/\ln 2 - 1)$, and the simulated
average mutual information (by the Monte Carlo method
via generating many realizations of $\mathbf{H}_w$ and
$\mathbf{n}$, see (3) for the formula). The lower bound with a
shift and the simulated curve match exactly in the low and high
SNR regions and are close to each other in the medium SNR
region. Further study verifies that this approximation is valid
for various numbers of transmit and receive antennas and
various input types and correlation parameters [10]. These
results imply that the precoder maximizing the lower bound can
be a good solution.
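
As a worked instance of the constant shift for the setting of Fig. 1 ($N_t = N_r = 2$, QPSK so $M = 4$), the arithmetic is shown below; the numbers are rounded and serve only as an illustration.

```latex
% Constant gap between the bound and the average mutual information
N_r\left(\frac{1}{\ln 2} - 1\right) \approx 2\,(1.4427 - 1) \approx 0.885 \text{ bits}

% Limits of the shifted bound  I_L + N_r(1/\ln 2 - 1)
\text{SNR} \to 0:\quad -0.885 + 0.885 = 0
\qquad
\text{SNR} \to +\infty:\quad (N_t \log M - 0.885) + 0.885 = 2 \log 4 = 4 \text{ bits}
```

Both limits coincide with those of the average mutual information, which is why the shifted curve matches the simulated one at the two ends of the SNR range.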

2) Convergence of the Two-Step Algorithm: The convergence
of the algorithm is considered with different feasible
initialization points. The initial power allocation vector
$\boldsymbol{\lambda}^{(0)}$ is non-negative and satisfies the
sum power constraint, while the initial right singular vectors
satisfy the unitary constraint.

As the lower bound is optimized iteratively by the proposed
algorithm, the average mutual information is forced to
improve. The probability density and cumulative distribution of
average mutual information for the optimized linear precoder
from different initialization points are depicted in Fig. 2,
which is obtained by generating 300,000 uniform random initial
power allocation matrices and right singular vectors. The curve
of probability density implies the existence of multiple local
optimum points, which verifies the nonconcavity result over
$\mathbf{P}$ (see Sec. IV-A3). Although there is a likelihood
of stopping at a local optimum, the two-step algorithm, from
arbitrary initialization points, achieves an average mutual
information of more than 1.535 bps/Hz, about 97% of the maximum
capacity with Gaussian inputs [13] (1.583 bps/Hz, see Fig. 3
for reference). That is, the iterative algorithm obtains a
satisfactory solution, even though the problem is nonconcave,
and brings the performance of MIMO systems with finite-alphabet
inputs close to the maximum capacity with Gaussian inputs.

Fig. 3. Average mutual information versus the SNR (dB) for
different strategies. The input signal is drawn from QPSK;
$2 \times 2$ MIMO channels are exponentially correlated with
$\rho_t = 0.8$ and $\rho_r = 0.5$.

3) Efficacy of the Linear Precoder: The performance of the 
proposed algorithm is shown in Fig. 3, which also includes 
several additional cases: no precoding for both QPSK inputs 
and Gaussian inputs, beamforming, maximum capacity [13], 
and maximum coding gain [14]. 

Although the precoding method of maximum capacity ob- 
tains gains when input signal is from Gaussian distribution, 
it results in a significant loss if applying the strategy to 
discrete inputs, especially at the medium to high SNR region. 
The reason for such performance comes from differences 
in designing the power allocation matrix and right singular 
vectors between finite-alphabet inputs and Gaussian inputs. 

Intuitively, in order to maximize capacity with Gaussian
inputs, allocating more power to the stronger subchannels and
less to the weaker subchannels is the solution. This design,
however, is not optimal for finite-alphabet inputs because the
average mutual information with finite inputs is bounded, and
allocating more power to subchannels that are close to
saturation is not efficient. Moreover, the right singular
vectors for Gaussian inputs can be an arbitrary unitary matrix,
while the case of finite inputs fails to follow the same rule,
as shown in (4).

The proposed precoding algorithm exploits the characterization
of the optimal precoding structure and achieves a solution
with the optimal left singular vectors, the optimal power
allocation vector (given arbitrary right singular vectors),
and locally optimal right singular vectors. Since different
initialization points have limited effect on the solution (see
Fig. 2), the two-step algorithm offers a significant gain from
an arbitrary initialization point. For example, the performance
of the proposed algorithm is about 1.9 dB, 2.2 dB, and 5.9 dB
better than the maximum coding gain, no precoding, and maximum
capacity methods, respectively, when the channel coding rate is
1/2. Moreover, when the SNR is less than -2.5 dB, the proposed
algorithm provides almost the same performance as the maximum
capacity design with Gaussian inputs, which is the upper bound
for all possible linear precoders.

VI. Conclusion 

This paper has considered the linear precoding over MIMO 
channels with statistical CSI. Instead of assuming Gaussian 
inputs, theoretically optimal but rarely realized in practice, it 
has explored the framework to maximize the average mutual 
information with the constraint of finite-alphabet inputs, which 
has been known as an important open problem. The obstacles
include two aspects: first, the closed form of the average mutual
information is lacking; second, the optimization problem over the
precoding matrix is nonconcave. Both obstacles have made the
problem of finding a good solution extremely difficult. This 
study has exploited the structure of the optimal precoder and 
solved this problem by a unified two-step iterative algorithm. 
Numerical examples have demonstrated the convergence and 
performance of the proposed algorithm. Compared to its 
conventional counterparts, the linear precoding method can 
provide a significant performance gain. 